/*==============================================================================
PART 2B: Deposit Cost Estimation
==============================================================================
Purpose:
This script estimates bank-specific deposit costs (insured, uninsured, and total)
from regressions of net non-interest expense on balance sheet components (all
scaled by total assets), controlling for bank size quartiles and absorbing
quarter fixed effects. The estimated coefficients are then used to construct
the cost measures. Refer to the manuscript for details on methodology and
assumptions.

Input:
- $path_clean/call_reports_forcostreg.dta (Prepared bank-quarter panel for deposit cost regression)

Output:
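- $path_temp/deposit_cost_reg1.ster, deposit_cost_reg2.ster, deposit_cost_reg3.ster,
  and deposit_cost_reg4${ext_suffix}.ster: saved estimates of regression Models 1-4.
- $path_temp/deposit_costs${ext_suffix}.dta: long-format panel with one row per bank
  (rssdid) and period (dec2021, feb2023, feb2024, i.e., the quarters 2021q4, 2022q4,
  and 2023q4), holding yq, period, sizeq, and the cost variables cost_ins, cost_unins,
  and cost_tot. The listing below shows the cost variables for earlier bank-quarters,
  before the Step 5 period filter (cost_u~s is Stata's abbreviated display of cost_unins):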
     +--------------------------------------------------+
     | rssdid       yq   cost_ins   cost_u~s   cost_tot |
     |--------------------------------------------------|
  1. |   5210   2015q4   1.634829   .5771257   1.556499 |
  2. |   5210   2016q1   1.634467    .624306   1.559064 |
  3. |   5210   2016q2   1.638065    .643885   1.558356 |
  4. |   5210   2016q3   1.642617   .6222965   1.560894 |
  5. |   5210   2016q4    1.47921   .6436817   1.410291 |
     |--------------------------------------------------|
  6. |   5210   2017q1   1.644852    .649159   1.559013 |
  7. |   5210   2017q2   1.480488    .665238   1.405775 |
  8. |   5210   2017q3   1.636731   .5196878   1.543037 |
  9. |   5210   2017q4   1.637138   .5637863   1.535118 |
 10. |   5210   2018q1   1.637021   .5673229   1.536631 |
............. continued

Methodology:
1. Load the bank-quarter panel and drop the 2022q4 observation of banks whose
   total assets changed by 50% or more (in absolute value) during that quarter.
2. Define bank size quartiles within each quarter, based on log total assets.
3. Estimate regressions of net non-interest expense on various balance sheet
   items (scaled by assets) over 2015q4-2019q4, including interactions with
   size quartiles, absorbing quarter fixed effects, and clustering standard
   errors by bank.
4. Use the estimated coefficients from the final regression model (Model 4) to
   calculate insured and uninsured deposit costs for each bank-quarter
   observation, conditional on the bank's size quartile.
5. Calculate the total deposit cost as a weighted average of insured and
   uninsured costs, using the uninsured share of domestic deposits as the weight.
6. Replace missing cost values with the sample mean.
7. Keep the period-end quarters (2021q4, 2022q4, 2023q4) and save the result in
   long format, one row per bank and period.

Last updated: Aug 9, 2025
==============================================================================*/

display "--- Starting Part 2B: Deposit Cost Estimation ---" // Indicate the start of the script

clear all // Clear memory of any previous dataset
use "$path_clean/call_reports_forcostreg.dta", clear // Load the prepared bank-quarter panel data

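tsset rssdid yq // Declare data as panel (required by the S. and L. operators below)

// Quarterly asset growth rate: (assets_t - assets_t-1) / assets_t-1
gen asset_g = S1.assets / L1.assets

// Drop the 2022q4 observation (only that bank-quarter, not the bank's full history)
// of banks whose total assets rose or fell by 50% or more during that quarter
drop if !missing(asset_g) & abs(asset_g) >= 0.5 & yq == yq(2022,4)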
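
// Optional check (a sketch added here, not part of the original pipeline):
// confirm that no 2022q4 observation with absolute asset growth of 50% or more
// remains, and inspect the growth distribution of the 2022q4 observations kept.
assert !(yq == yq(2022,4) & !missing(asset_g) & abs(asset_g) >= 0.5)
summarize asset_g if yq == yq(2022,4), detail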

//===============================================================================
// Step 1: Identify Size Bins
//===============================================================================
// Purpose: Create bank size quartiles (sizeq) based on the natural logarithm
// of total assets (lnassets), computed separately for each quarter (by(yq)).
// These quartiles enter the regressions as categorical controls and in
// interactions. Note: egen's xtile() function is provided by the user-written
// egenmore package (ssc install egenmore).
gen lnassets = ln(assets) // Calculate natural logarithm of total assets

// Create size quartiles (nq(4)) based on lnassets, calculated separately for each quarter
egen sizeq = xtile(lnassets), nq(4) by(yq)
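
// Optional check (a sketch added here, not part of the original pipeline):
// xtile() assigns values 1-4 within each quarter, or missing when lnassets is
// missing (e.g., non-positive assets). Quartile counts should be roughly equal
// within a quarter; 2019q4 is used below only as an example quarter.
assert inrange(sizeq, 1, 4) if !missing(sizeq)
quietly count if missing(sizeq)
display "Bank-quarters without a size quartile: " r(N)
tabulate sizeq if yq == yq(2019,4)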

//===============================================================================
// Step 2: Estimate Deposit Cost Regressions
//===============================================================================
// Purpose: Estimate the relationship between net non-interest expense (scaled
// by assets) and various balance sheet components (also scaled by assets),
// controlling for bank size quartiles (i.sizeq) and time fixed effects (absorb(yq)).
// The coefficients from these regressions, particularly those related to deposit
// categories and their interactions with size, are used to calculate deposit costs.
// The regressions are run on data from 2015q4 to 2019q4, with standard errors
// clustered by bank (rssdid); reghdfe is user-written (ssc install reghdfe).
// Each model's estimates are saved to a .ster file for use in generating
// Table 3, Panel A.

estimates clear // Clear any previously stored estimation results

// Model 1: Baseline regression with core, foreign, and OBM deposits, and other balance sheet items.
// Uses size quartiles as main effects.
quietly reghdfe net_nonintexp_assets coredep_assets foreigndep_assets obm_assets equity_assets tradingliabilities_assets otherliabilities_assets loans_assets tradingassets_assets otherassets_assets i.sizeq if yq>=yq(2015,4) & yq<=yq(2019,4), cluster(rssdid) absorb(yq)
estimates store reg1 // Store results for Model 1
estimates save "$path_temp/deposit_cost_reg1.ster", replace
estimates drop reg1

// Model 2: Replaces core deposits (coredep_assets) with uninsured (uninszm_assets) and insured (inszm_assets, stdep_assets) deposit categories.
quietly reghdfe net_nonintexp_assets uninszm_assets inszm_assets stdep_assets foreigndep_assets obm_assets equity_assets tradingliabilities_assets otherliabilities_assets loans_assets tradingassets_assets otherassets_assets i.sizeq if yq>=yq(2015,4) & yq<=yq(2019,4), cluster(rssdid) absorb(yq)
estimates store reg2 // Store results for Model 2
estimates save "$path_temp/deposit_cost_reg2.ster", replace
estimates drop reg2

// Model 3: Introduces interaction between size quartiles and short-term deposits (stdep_assets).
quietly reghdfe net_nonintexp_assets uninszm_assets inszm_assets i.sizeq#c.stdep_assets foreigndep_assets obm_assets equity_assets tradingliabilities_assets otherliabilities_assets loans_assets tradingassets_assets otherassets_assets i.sizeq if yq>=yq(2015,4) & yq<=yq(2019,4), cluster(rssdid) absorb(yq)
estimates store reg3 // Store results for Model 3
estimates save "$path_temp/deposit_cost_reg3.ster", replace
estimates drop reg3

// Model 4: Introduces interactions between size quartiles and both insured (inszm_assets)
// and short-term deposits (stdep_assets). This is the final model used to extract coefficients
// for cost calculation.
quietly reghdfe net_nonintexp_assets uninszm_assets i.sizeq#c.(inszm_assets stdep_assets) foreigndep_assets obm_assets equity_assets tradingliabilities_assets otherliabilities_assets loans_assets tradingassets_assets otherassets_assets i.sizeq if yq>=yq(2015,4) & yq<=yq(2019,4), cluster(rssdid) absorb(yq)
estimates store reg4 // Store results for Model 4
estimates save "$path_temp/deposit_cost_reg4${ext_suffix}.ster", replace
estimates drop reg4 // Drop after saving; coefficients for the next step are preserved in e()

// All four models are now saved as individual .ster files for use in table generation
display "--- Deposit cost regression results saved to individual files in $path_temp ---"
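
// Optional inspection (a sketch added here, not part of the original pipeline):
// e() still holds the Model 4 results, so the coefficients that Step 3 uses can
// be listed before they are applied. The Wald test of equal insured-deposit
// slopes across the four size quartiles is an added diagnostic only.
forvalues q = 1/4 {
    display "Size quartile `q': b[inszm_assets] = " %9.4f _b[`q'.sizeq#c.inszm_assets] ///
        "  b[stdep_assets] = " %9.4f _b[`q'.sizeq#c.stdep_assets]
}
display "All quartiles: b[uninszm_assets] = " %9.4f _b[uninszm_assets]
test (_b[1.sizeq#c.inszm_assets] = _b[2.sizeq#c.inszm_assets]) ///
     (_b[1.sizeq#c.inszm_assets] = _b[3.sizeq#c.inszm_assets]) ///
     (_b[1.sizeq#c.inszm_assets] = _b[4.sizeq#c.inszm_assets])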

//===============================================================================
// Step 3: Generate Insured and Uninsured Deposit Costs
//===============================================================================
// Purpose: Calculate bank-specific insured (cost_ins) and uninsured (cost_unins)
// deposit costs using the coefficients from the last estimated regression (Model 4),
// which are still the active results in e(). The calculation is performed
// separately for each size quartile (sizeq).
// Note: The two formulas use different denominators. cost_ins averages the
// quartile-specific coefficients on inszm_assets and stdep_assets, weighted by
// each item's share of (inszm_assets + stdep_assets); cost_unins scales the
// single coefficient on uninszm_assets by uninszm_assets' share of
// (uninszm_assets + ltdep_assets).

gen cost_ins=. // Initialize variable for insured deposit cost
gen cost_unins=. // Initialize variable for uninsured deposit cost

// Loop through each size quartile (1 to 4) to apply size-specific coefficients
forvalues i=1/4{

    // Calculate insured cost for banks in size quartile `i'.
    // Formula: (Coef_Size`i'_Insured * Share_Insured) + (Coef_Size`i'_STDep * Share_STDep)
    // where Share_Insured = inszm_assets / (inszm_assets + stdep_assets)
    // and Share_STDep = stdep_assets / (inszm_assets + stdep_assets)
    // _b[`i'.sizeq#c.inszm_assets] is the coefficient for the interaction of size quartile `i` and inszm_assets.
    // _b[`i'.sizeq#c.stdep_assets] is the coefficient for the interaction of size quartile `i` and stdep_assets.
    replace cost_ins = _b[`i'.sizeq#c.inszm_assets]*(inszm_assets/(inszm_assets+stdep_assets)) + _b[`i'.sizeq#c.stdep_assets]*(stdep_assets/(inszm_assets+stdep_assets)) if sizeq==`i'

    // Calculate uninsured cost for banks in size quartile `i'.
    // Formula: Coef_Uninsured * (uninszm_assets / (uninszm_assets + ltdep_assets))
    // _b[uninszm_assets] is not interacted with size in Model 4, so the same coefficient
    // applies in every quartile. ltdep_assets is not a regressor and enters only the
    // denominator, so that component is assigned zero cost.
    replace cost_unins = _b[uninszm_assets]*(uninszm_assets/(uninszm_assets+ltdep_assets)) if sizeq==`i'
}
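
// Optional diagnostic (a sketch added here, not part of the original pipeline):
// a zero denominator makes the corresponding share, and hence the cost, missing;
// Step 4 later fills such cases with the sample mean. These counts show how many
// bank-quarters are affected for that reason.
quietly count if inrange(sizeq, 1, 4) & inszm_assets + stdep_assets == 0
display "Zero cost_ins denominator (inszm_assets + stdep_assets): " r(N)
quietly count if inrange(sizeq, 1, 4) & uninszm_assets + ltdep_assets == 0
display "Zero cost_unins denominator (uninszm_assets + ltdep_assets): " r(N)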

// Calculate total deposit cost as a weighted average of insured and uninsured costs.
// The weight for uninsured cost is the uninsured share of domestic deposits (uninsuredsh_domdep).
// The weight for insured cost is (1 - uninsuredsh_domdep).
gen cost_tot = cost_ins*(1-uninsuredsh_domdep) + cost_unins*uninsuredsh_domdep
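
// Optional consistency check (a sketch added here, not part of the original
// pipeline): when the weight uninsuredsh_domdep lies in [0,1], cost_tot is a
// convex combination and must lie between cost_ins and cost_unins. The 1e-6
// tolerance absorbs rounding, since gen stores float by default.
quietly count if !missing(cost_tot) & !inrange(uninsuredsh_domdep, 0, 1)
display "Uninsured share outside [0,1]: " r(N)
quietly count if !missing(cost_tot) & inrange(uninsuredsh_domdep, 0, 1) & ///
    (cost_tot < min(cost_ins, cost_unins) - 1e-6 | cost_tot > max(cost_ins, cost_unins) + 1e-6)
display "cost_tot outside [cost_ins, cost_unins]: " r(N)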


//===============================================================================
// Step 4: Handle Missing Cost Values
//===============================================================================
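// Purpose: Replace any missing values in the calculated cost variables
// (cost_ins, cost_unins, cost_tot) with the pooled mean of the respective variable
// across all bank-quarters in memory (i.e., before the Step 5 period filter).
// cost_tot is imputed from its own mean, not rebuilt from the imputed components.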
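
// Optional report (a sketch added here, not part of the original pipeline):
// number of missing values that the mean imputation below will fill.
foreach v of varlist cost_ins cost_unins cost_tot {
    quietly count if missing(`v')
    display "`v': " r(N) " missing values to be imputed"
}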

// For each cost variable, compute its mean and replace missing values with it
foreach v of varlist cost_ins cost_unins cost_tot {
    quietly summarize `v'
    replace `v' = r(mean) if missing(`v')
}
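
// Optional check (a sketch added here, not part of the original pipeline):
// after mean imputation no cost value should remain missing; the assert stops
// the script if one does (e.g., if a cost variable was missing everywhere).
assert !missing(cost_ins, cost_unins, cost_tot)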

//===============================================================================
// Step 5: Create Long Format Panel and Save Data
//===============================================================================
// Convert quarter strings to dates for filtering (matching 2a_deposit_betas.do)
local dt_2021q4 = tq(2021q4) // dec2021 period end
local dt_2022q4 = tq(2022q4) // feb2023 period end
local dt_2023q4 = tq(2023q4) // feb2024 period end

// Keep only observations corresponding to the end quarters of the defined periods
keep if yq == `dt_2021q4' | yq == `dt_2022q4' | yq == `dt_2023q4'

// Create a string variable to identify the period for each observation
gen period = ""
replace period = "dec2021" if yq == `dt_2021q4'
replace period = "feb2023" if yq == `dt_2022q4'
replace period = "feb2024" if yq == `dt_2023q4'

// Keep relevant variables for the long format output dataset
keep rssdid yq period cost_* sizeq*

// Sort the final dataset
sort rssdid period
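
// Optional check (a sketch added here, not part of the original pipeline): the
// saved file should contain one row per bank and period. tsset already requires
// unique rssdid-yq pairs, and each period maps to one quarter, so isid should pass.
isid rssdid period
tabulate period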

// Save the dataset to the temporary directory
save "$path_temp/deposit_costs${ext_suffix}.dta", replace

display "--- Deposit cost estimation completed ---" // Indicate the completion of the script
